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In this paper, we consider regression models with a Hilbert-space- 
valued predictor and a scalar response, where the response depends 
on the predictor only through a finite number of projections. The 
linear subspace spanned by these projections is called the effective 
dimension reduction (EDR) space. To determine the dimensionality 
of the EDR space, we focus on the leading principal component scores 
of the predictor, and propose two sequential \ 2 testing procedures un- 
der the assumption that the predictor has an elliptically contoured 
distribution. We further extend these procedures and introduce a test 
that simultaneously takes into account a large number of principal 
component scores. The proposed procedures are supported by the- 
ory, validated by simulation studies, and illustrated by a real-data 
example. Our methods and theory are applicable to functional data 
and high-dimensional multivariate data. 

1. Introduction. Li (1991) considered a regression model in which a 
scalar response depends on a multivariate predictor through an unknown 
number of linear projections, where the linear space spanned by the direc- 
tions of the projections was named the effective dimension reduction (EDR) 
space of the model. Li (1991) introduced a % 2 test to determine the dimen- 
sion of the EDR space, and an estimation procedure, sliced inverse regression 
(SIR), to estimate the EDR space. Li's results focused on the case where p, 
the dimension of the predictor, is much smaller than n, the sample size. It is 
not obvious how to extend his results to high-dimensional multivariate data 
where p is comparable to or larger than n; see Remark 5.4 in Li (1991). 
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Regression problems for functional data have drawn a lot of attention 
recently. In particular, regression models in which the predictor is func- 
tional while the response is scalar have been extensively investigated; for 
linear models, see Cardot, Ferraty and Sarda (2003), Ramsay and Silver- 
man (2005), Cai and Hall (2006) and Crambes, Kneip and Sarda (2009); 
for nonlinear models, see Hastie and Tibshirani (1990), Cardot and Sarda 
(2005), James and Silverman (2005) and Miiller and Stadtmiiller (2005). 
Ferre and Yao (2003, 2005) extended SIR to a functional-data setting, and 
showed that the EDR space can be consistently estimated under regularity 
conditions provided that the true dimension of the space is known; see also 
Forzani and Cook (2007) and Ferre and Yao (2007). However, deciding the 
dimensionality of the EDR space is much more challenging in that case, and 
there has not been a formal procedure to date. 

In this paper, we address the problem of deciding the dimensionality of 
the EDR space for both functional and high-dimensional multivariate data. 
As in Ferre and Yao (2003), we adopt the framework where the predictor 
takes value in an arbitrary Hilbert space. To better control the sample infor- 
mation in the (high-dimensional) predictor, we focus on the sample principal 
component scores rather than the raw data. Since the leading principal com- 
ponent scores optimally explain the variability in the predictor, it is natural 
to expect that the leading sample principal component scores also offer the 
most relevant information for the inference problem. Two statistical tests will 
be developed for testing whether the dimension of the EDR space is larger 
than a prescribed value; an estimator of the dimension of the EDR space 
will then be obtained by sequentially performing the tests developed. We 
will assume that the Hilbert-space- valued predictor has an elliptically con- 
toured distribution, a common assumption for inverse regression problems 
[cf. Cook and Weisberg (1991), Schott (1994) and Ferre and Yao (2003)]. 
These tests will be first developed by focusing on a fixed number of princi- 
pal component scores; it will be shown that the null distributions of the test 
statistics are asymptotically x 2 ■ To address high and infinite-dimensional 
data, we propose an "adaptive Neyman" test, which combines the infor- 
mation in a sequence of x 2 tests corresponding to an increasing number of 
principal component scores. 

We introduce the background and notation in Section 2. The main the- 
oretical results and test/estimation procedures are described in Section 3. 
Simulation studies are presented in Section 4, and a real application on near- 
infrared spectrum data is presented in Section 5. Finally, all of the proofs 
are collected in Appendix. 

2. Model assumptions and preliminaries. Let {X(t),t £ J?} be a real- 
valued stochastic process with an index set J^. Assume that ¥(X 6 = 1, 
where Jt? is some Hilbert space containing functions on J 1 and equipped 
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with inner product (•,•)■ We do not place any restriction on J? and so X 
can be extremely general. For instance, for multivariate data J* is a finite 
set, with p elements, say, and J$? can then be taken as MP equipped with 
the usual dot product; in functional data analysis, Jtf? is commonly assumed 
to be L 2 (J) for some bounded interval <y, with inner product (g,h) = 



where Y is scalar, / is an arbitrary function, /3i,...,($k are linearly inde- 
pendent elements in Jff, and e is a random error independent of X. Assume 
that f,K,(3i,...,/3K are all unknown, and we observe a random sample 
(Xi,Yi), l<i<n, which are i.i.d. This is similar to the the setting of Ferre 
and Yao (2003, 2005). Following Li (1991), we call /3 1 ,...,/3 K the EDR di- 
rections, and span(/?i, . . . ,/3k) the EDR space. Without fixing /, the EDR 
directions are not identifiable; however, the EDR space is identifiable. The 
focus of this paper is the estimation of the dimension, K, of the EDR space. 

We assume that the X^s are observed at each t £ J? . For functional data, 
this is an idealized assumption as no functions on a continuum can be fully 
observed. However, it is a reasonable approximation for densely observed 
smooth data, for which the Tecator data discussed in Section 5 is a good 
example. In that situation, for most theoretical and practical purposes, one 
can fit continuous curves to the discrete-time data and then treat the fitted 
curves as the true functional data; see, for example, Hall, Mxiller and Wang 
(2006), Cai and Hall (2006) and Zhang and Chen (2007). The case of sparsely 
observed functional data requires more attention and will not be studied in 
this paper. It may also be of interest to study the case where X contains 
measurement error; see (a) of Section 3.3. 

2.1. Principal components. First, we focus on the generic process X. 
Denote the mean functions \i of X by fi(t) = E{X(t)}, t S . The covariance 
operator of X is the linear operator Fx '■= E((X — fj,) ® (X — fx)), where, for 
any h S Jtif, h(& h is the linear operator that maps any g £ J4? to (h,g)h. It 
can be seen that Tx is a well-defined compact operator so long as E(||JT || 4 ) < 
oo, which we assume throughout the paper; see Eubank and Hsing (2010) 
for the mathematical details in constructing fi and Tx- Then there exist 
nonnegative real numbers U\ > u)% > • • • , where ^ • u>j < oo, and orthonormal 
functions ipi,ip2i ■ ■ ■ m ^ such that Txtfjj = Wjipj for all j; namely, the cjj's 
are the eigenvalues and tp^s the corresponding eigenfunctions of Tx- The 
ipj's are commonly referred to as the principal components of X. It follows 
that 




(2.2) 




j 
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and 



(2.3) 



where the £j's are zero- mean, uncorrelated random variables with Var(£j) = 
cjj, and the rjj's are standardized £,-'s. Call m the standardized jth principal 
component score of X . The representations in (2.2) and (2.3) are commonly 
referred to as the principal component decomposition and the Karhunen- 
Loeve expansion, respectively; see Ash and Gardner (1975) and Eubank and 
Hsing (2010) for details. 

In view of (2.1) and (2.3), any component of that is in the orthogonal 
complement of the span of the tpj is not estimable. As explained above, 
this paper does not address the estimation of the Thus, assume without 
generality that the /3fc's are spanned by the ipj's and write 



where, for simplicity, the constants //),..., {/3k, fj) are absorbed by /. 
For the i.i.d. sample (Xi,Yi), . . . ,(X n ,Y n ), let rjij be the standardized jth 
principal component score of Xi, and write 



2.2. Elliptically contoured distributions. As mentioned in Section 1, the 
relevance of elliptical symmetry is evident in the inference of (2.1). We devote 
this subsection to a brief introduction of the notion of elliptically contoured 
distribution for Hilbert-space-valued variables. 

Let X be as defined in Section 2.1. By the assumption E(||A|| 4 ) < oo, 
the distribution of X is determined by the (marginal) distributions of the 
random variables (h,X),h £ Jrff. Say that X has an elliptically contoured 
distribution if 



(2.4) 




By (2.3) and (2.4), (j3y.,X) = (/3fe,/i) + ^jbkji]ji an d (2.1) can be re-expressed 
as 




(2.5) 




(2.6) 



E(e i ( h ' x -ri) = cl)((h,m)) 



for some some function eft on K and self-adjoint, nonnegative operator E. Re- 
call that X is said to be a Gaussian process if (h, X) is normally distributed 



DECIDING THE DIMENSION OF EFFECTIVE DIMENSION REDUCTION SPACE5 



for any h G Jff, and so (2.6) holds with 4>{t) = exp(— 1/2) and £ = Tx- How- 
ever, (2.6) in general describes a much larger class of distributions. 

The mathematics necessary to characterize elliptically contoured distri- 
butions was worked out in Schoenberg (1938); see Cambanis, Huang and 
Simons (1981) and Li (2007). It follows that definition (2.6) implies that 
£ is a constant multiple of Tx and (p(t 2 ) is a characteristic function. More 
explicitly, (2.6) leads to the characterization 

(2.7) X-n = QX, 

where G and X are independent, is a nonnegative random variable with 
E(0 2 ) = 1 and X has the same covariance operator as X; if X G M. p and 

rank(rx) = k > 1 then X = 0^4 pX fc£4xi where AA T = Fx and U is uni- 
formly distributed on the ^-dimensional sphere with radius VX; if rank(rx) = 
oo then X is necessarily a zero- mean Gaussian process. Recall that Ukxi is 
asymptotically Gaussian [cf. Spruill (2007)] and so the infinite-dimensional 
representation can be viewed as the limit of the finite-dimensional one. 

2.3. Functional inverse regression. To introduce functional inverse re- 
gression, we first state some conditions: 

(CI) E(||X|| 4 )<oo. 

(C2) For any function b G Jff, there exist some constants cq,...,ck such 
that 

E((b,X)\(fa,X),...,(p K ,X)) = c + c 1 (pi,X) + '-' + CK(PK,X). 

(C3) X has an elliptically contoured distribution; namely, (2.7) holds. 

Conditions (C1)-(C3) are standard conditions in the inverse regression lit- 
erature; see, for instance, Ferre and Yao (2003, 2005). As mentioned earlier, 
condition (CI) guarantees the principal decomposition; moreover, it also 
ensures the convergence rate of ri" 1 / 2 in the estimation of the eigenvalues 
and eigenspaces of Tx based on an i.i.d. sample X\, . . . , X n ; see Dauxois, 
Pousse and Romain (1982). Condition (C2) is a direct extension of (3.1) 
in Li (1991) which addresses multivariate data. If X is a Gaussian process, 
then projections of X are jointly normal, from which (C2) follows easily. 
Condition (C3) describes a broader class of processes satisfying (C2) than 
the Gaussian process; for convenience (C3) is often assumed in lieu of (C2). 

Call the collection {E(X(t)\Y),t G J^} of random variables the inverse 
regression process and denote its covariance operator by T x \y- We will use 
the notation Im(T), for any operator T, to denote the range of T. The 
following result, first appeared in Ferre and Yao (2003), is a straightforward 
extension of Theorem 3.1 of Li (1991). 
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Theorem 2.1. Under (CI) and (G2), Im(T x \ Y ) c span(r x £i, ... ,T X /3 K ). 

Theorem 2.1 implies that span(Tx/3i, ■ • ■ , r^As:) contains all of the eigen- 
functions that correspond to the nonzero eigenvalues of T x \y ■ Consequently, 
if r x \y nas K nonzero eigenvalues, then the space spanned by the eigenfunc- 
tions is precisely span(Tx/3i, . . . ,T x [3k)- In that case, one can in principle 
estimate span(/?i, . . . , 13k) through estimating both T x and T x \y ■ This forms 
the basis for the estimation of the EDR space [cf. Li (1991) and Ferre and 
Yao (2003, 2005)]. 

While T x \y is finite-dimensional under (CI) and (C2), if Jrf? is infinite- 
dimensional then its definition still involves infinite-dimensional random 
functions. In order to implement any inference procedure, we consider a 
finite-dimensional adaptation using principal components. 

Let m be any positive integer, where m<n—l and, if X is p-dimensional, 
m < p. Define b fej(m ) = (b k i, ■ ■ ■ , b km )' T , ?7 i (m) = (rjn, . . . , rj im ) T and s ik = 
^2j >m Vijbkj- Then (2.5) can be expressed as 

(2.8) Yi = /(b^ (m) T7j )(m) + Qi, . . . ^K,{m)^i,{m) + SiK,£i)- 

If one regards the f]i^ m ) as predictors and combine the q& with £j to form the 
error, then (2.8) bears considerable similarity with the multivariate model 
of Li (1991). One fundamental difference is that although the Q k are mi- 
correlated with rji( m \, they might not be independent of r]^^, unless X is 
Gaussian. Another major difference is that we do not directly observe TJi^ m ^ 
so that this model might be viewed as a variation of the errors-in-variables 
model in Carroll and Li (1992). Our estimator for K will be motivated by 
the finite-dimensional model (2.8). The details of the procedure, including 
the role of m, will be explained in Section 3. To pave the way for that, we 
briefly discuss the inference of the b fc ( m ) below. 
We first need to estimate rji^ m y Let 

n n 

X = n- 1 Xi and f x = n~ x ^{X, -X)® (X t - X) 

i=l i=l 

be the sample mean function and the sample covariance operator, respec- 
tively. Let ujj and ipj be the jth sample eigenvalue and eigenfunction of Y x . 
By Dauxois, Pousse and Romain (1982), Qj and ipj are root-n consistent un- 
der (CI). The standardized jth principal component scores of Xi are then 

estimated by % =uJ 1/2 (-tpj,Xi - X); let %( m ) = ■ ■ ■ ,mm) T ■ 

Based on the "data" (jjum^jYi), 1 <i < n, the usual sliced inverse regres- 
sion (SIR) algorithm can be carried out as follows. Partition the range of Y 
into disjoint intervals, Sh, h = 1, . . . ,H, where ph '■= P(Y £ Sh) > for all h. 
Define 

(2.9) & jth = E( Vj \Y e S h ), h)(ro) = E( V{m) \Y e S h ) = (0 1>fc , . . . , $ m , h ) T 
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and 

(2.10) n h = E(X|y G S h ) = fi + J2^/%h^j- 

3 

Let V( m ) = YlhPh'&h,(m,)'&'h (m) ^he between-slice covariance matrix. 
In the finite-dimensional model (2.8) with (^u, . . . ,SiK,£i) playing the role 
of error, V( m ) is the sliced-inverse-regression covariance matrix. The eigen- 
vectors of Vt m \ corresponding to the nonzero eigenvalues are contained in 
span(b 1) ( m ), . . . , b^( m )). The matrix Vi m ) is estimated by the corresponding 
sample version V {m) = T,h=iPh^h,(m)(^h,(m)) T , where 



(2.11) 



Ph = n h /n, n h = 2_^ I(Yi G S h ) and 



= — J^ViXm) 1 ^ G S h)- 

Uh i=l 

Letting bw^j), . . . , b^r m ) be the first K eigenvectors of V( m ), the estimators 
of /3fc's are given by 

m 

%(t) =Y,"J 1/2 bk3$j(t), k = l,...,K. 

3=1 

In order for span(/3i, . . . ,/3k) to consistently estimate the EDR space, it is 
necessary that span(/zi — fj,, . . . , fi^ — fj,) have the same dimension as the EDR 
space, and that m tends to oo with n in some manner. However, first and 
foremost, we must know K beforehand, which makes the determination of 
K a fundamental issue. 

The matrix Vt m ) w iU be our basis for deciding K. Here, we define some 
notation related to Vr m ) and V( m ) for future use. For any m x 1 vector u, let 
t / u = /-uu T ; let g = {gi, ■ ■ ■ ,gn) = {p{ /2 , ■ ■ ■ ,p]i 2 ), and g = (gi, . . . ,g H ) = 



Ph ) 


. Define 










M = 


[$1,(771); ■ • 


■ i^ir,(m)Lxfli 


G 


= diag{flf 1 ,.. 


■■■>9h) 


F = 




5 {m) = MF, 








M = 


[#l,(m)> ■ • 


■ i^,(m)]mxHi 


G 


= diag{5i,.. 


■,9h) 


F = 




B (ro) = MF, 









where ~&h,(m) and ^hAm) are defined in (2.9) and (2.11), respectively. Thus, 
the inverse-regression covariance matrices V( m ) and V( m ) can be rewritten 
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as 

( 2 - 12 ) V (m) = B ( m ) B fm), V (m)=B{m)B]rny 

3. Deciding the dimension of EDR space. As explained in previous sec- 
tions, we are particularly interested in functional data or high-dimensional 
multivariate data. Existing methods for deciding the dimensionality of EDR 
space in the multivariate setting [Li (1991), Schott (1994)] are not directly 
applicable to the types of data that are focused on in this paper. Ferre and 
Yao (2003, 2005) used a graphical approach to determine the number of 
EDR directions for functional data but a formal statistical procedure has 
been lacking. 

Our approach is generically described as follows. To decide the dimension 
of the EDR space, as in Li (1991), we will conduct sequential testing of 
Hq-.K < Kq versus H a : K > Kq for Kq = 0,1,2,...; we will stop at the first 
instance Kq = K when the test fails to reject Hq and declare K as the true 
dimension. Below, we consider two types of tests in the sequential testing 
procedure motivated by (2.8). In Section 3.1, we assume that m is fixed, 
while in Section 3.2 we consider m in a wide range. 

3.1. Chi-squared tests based on a fixed m. Fix an m and focus on the 
between-slice inverse covariance matrix Vt m \, which has dimension m x m; 
recall that it only makes sense to consider m such that m < n — 1 and, if X 
is a p-dimensional vector, m<p. Define 

K(m) =rank(V( m )). 

Clearly, if( m ) < K for all m. It is desirable to pick an m such that Kr m ) 
f\ . Note that this condition means that the projections of all of the EDR 
directions onto the space spanned by the first m principal components are 
linearly independent, which is very different from saying that all of the EDR 
directions are completely in the span of the first m principle components; see 
the examples in Section 4. However, picking an m to guarantee Kr m \ = K 
before analyzing the data is clearly not always possible. A practical approach 
is to simply pick an m such that the first m principal components explain 
a large proportion, say, 95%, of the total variation in the X^s. Such an 
approach will work for most real-world applications. Still, keeping m fixed 
has its limitations. We will address them in more detail in future sections. 

In the following, let A,(M) denotes the jth largest eigenvalue of a 
nonnegative-definite square matrix M. Under Hq:K < Kq, we have 
A/f _|_i(V( m )) = • • • = A m (V( m )) = 0. Consider the test statistic 

m 

j=K +l 
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Since Vt m \ estimates Vi m \, large values of ^K ,(m) wm support the rejection 
of Hq. The following theorem provides the asymptotic distribution of £?K ,(m) 
under Hq. For the convenience of the proofs, we will assume below that the 
positive eigenvalues of Tx are all distinct. 

The following addresses the case where X is a Gaussian process. 

Theorem 3.1. Suppose that (CI) holds and X is a Gaussian process. 
Assume that K < Kq, and let H > Kq + 1 and m > Kq + 1. Denote by a 
random variable having a x 2 distribution with (m — Kq)(H — Kq — 1) degrees 
of freedom. 

(i) If K ( m ) = K o, then 

(3.2) ^K ,(m) $E asn— »oo. 

(ii) If K( m \ < Kq, then ^K ,(m) ^ s asymptotically stochastically bounded 
by & ; namely, 

limsupP(5^ 0i(rn) > x) < P( > x) for all x. 

n—¥oo 

Theorem 3.1 suggests a x 2 test for testing Hq : K < Kq versus H a : K > Kq, 
which is an extension of a test in Li (1991) for multivariate data. Ideally, 
case (i) holds and the x 2 test has the correct size asymptotically, as n — > oo . 
For a variety of reasons case (ii) may be true, for which the x 2 test will 
be conservative. This point will be illustrated graphically by a simulation 
example in Figure 1 in Section 4. 

The proof of Theorem 3.1 is highly nontrivial, which goes considerably 
beyond the scope of the multivariate counterpart. A theoretical result that 
is needed to establish (ii) of Theorem 3.1 appears to be new and is stated 
here. 

Proposition 3.2. Let Z be a px q random matrix and we write Z = 
\Z\\Z<i\ where Z\ and Z2 have sizes p x r and p x (q — r), respectively, for 
some < r < min(p, q). Assume that Z\ and Z2 are independent, and Z2 
contains i.i.d. Normal (0, 1) entries. Then YTj= r +i ^j(ZZ T ) is stochastically 
bounded by x 2 with (jp — r)(q — r) degrees of freedom. 

The case where Z is a matrix of i.i.d. Normal(0, 1) entries can be viewed 
as the special case, r = 0, in Proposition 3.2. In that case, the bound is the 
exact distribution since Y^j=i ^j(ZZ T ) equals the sum of squares of all of 
the entries of Z and is therefore distributed as x 2 with pq degrees of freedom. 

Next, we address the scenario where X is elliptically contoured but not 
necessarily Gaussian. Let 

(3.3) T h = E(e 2 \YeS h ), h=l,...,H. 
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If if( m ) = Kq, then it can be seen from the proofs in the Appendix that 

H-Ko-l 

(3.4) ^K ,(m) — > ^2 h&k asn->-oo, 

k=l 

where =^'s are distributed as i.i.d. x 2 with m — Kq degrees of freedom, and 
Si,..., 5h-k -i are the eigenvalues of AHA, with H = ^g{I — BTJB^ x 

BT -)) _ -B(m)}^g an d A = diag(r 1 1/ ' 2 , . . . , r^/ 2 ). If X is Gaussian, then r^'s 
and (5fc's are identically equal to 1. In general, the limiting null distribution 
in (3.4) depends on the unknown parameters 5k- Cook (1998) suggested 
carrying out this type of test by simulating the critical regions based on 
the estimated values of these parameters. Below, we introduce a different 
approach by adjusting the test statistic so that the limiting distribution is 
free of nuisance parameters. 

Under Hq : K < Kq, let m > Kq and &2 be the matrix whose columns are 
the eigenvectors that correspond to the m — Kq smallest eigenvalues of V( m ) . 
The definition (3.3) suggests (see proof of Theorem 3.3 in the Appendix) that 
Tfi can be estimated by 

1 { - ~ " 

?h = i tF~\ — tr i wE^h-^m) 

(3.5) 

Put A = diag(r 1 1 ^ 2 , . . . , t^ 2 ), A = diag^ 1 ^ 2 , . . . ,t^ 2 ), and define 

W (m) = B (m) A(A £ / g A)-, E (m) = W [m) W? m) , 

W {m) = B (m) A(A,/gA)-, S (m) = W( m )W^), 

where A~ denotes the Moore-Penrose generalized inverse of the matrix A. 
By Lemma 3 below, £( m ) has the same null space as V( m y Thus, under 
Hq:K < Kq, we have Ax +i(£(m)) = • • • = A m (£( m )) = 0. We therefore pro- 
pose the test statistic 

m 

j=K +l 

where, again, large values of support the rejection of Hq. The follow- 

ing result extends (i) of Theorem 3.1 from the case where X(t) is Gaussian 
to a general elliptically contoured process. While we conjecture that (ii) of 
Theorem 3.1 can be similarly extended, we have not been able to prove it. 
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Theorem 3.3. Suppose that (CI) and (C3) hold. Assume that the true 
dimension K < Kq and let H > Kq + 1 and m > Kq + 1 . If Kr m ) = Kq then 

^K Q ,{m) y ^m-Ko){H-K -l) as n -)• oo. 

The test of Hq : K < Kq based on ^K ,(m) an d ^k (m) anc ^ ^he asvm Ptotic 
null distribution, x^ m _ Ko ^ H _ Ko _ l y will be referred to as the x 2 test and 
the adjusted x 2 test, respectively. 

3.2. Adaptive Neyman tests. So far we considered tests based on a fixed 
m. In most situations, in practice, choosing the smallest m for which the first 
m sample principal components explain most of the variations should work 
fairly well for determining K. However, for functional or high-dimensional 
multivariate data, one cannot theoretically rule out the possibility that the 
EDR directions can only be detected by examining the information con- 
tained in higher-order principal components. 

A careful inspection reveals two different issues here. The first is the ques- 
tion that if we have an unusual model in which some EDR directions depend 
only on high-order principal components that the data have little power to 
discern, can any approach be effective in detecting the presence of those 
directions? The answer is "not likely" since, intuitively, we can detect the 
presence of those EDR directions no better than we can the principle compo- 
nents that comprise the directions. This is due more to the nature of high or 
infinite-dimensional data than the limitation of any methodology. However, 
keep in mind that principal components are not ordinary covariates, but are 
mathematical constructs which not only depend on the covariance function 
of X but also the choice of inner product of Jtf. Thus, one can argue that 
having an EDR direction that is orthogonal to a large number of low-order 
principal components of the predictor is itself a rather artificial scenario and 
is not likely to be the case in practice. 

Let us now turn to the second issue. Assume that all of the EDR direc- 
tions do contain low-order principal components which can be estimated well 
from data. For example, suppose each EDR direction is not in the orthogonal 
complement of the space spanned by the first three principal components 
and so the procedures described in Section 3.1 will in principle work if we let 
m = 3. However, since that knowledge is not available when we conduct data 
analysis, to be sure perhaps we might consider picking a much larger trun- 
cation point, say, m = 30. The problem with this approach is that, when the 
sample size is fixed, the power of the tests will decrease with m. Intuitively, 
when m isjarge the information contained in the individual components of 
&h,(m) = {^i,hi ■ ■ ■ )^m,h) T becomes diluted. We will illustrate this point nu- 
merically in Section 4.1. This is strikingly similar to the situation of testing 
whether the mean of a high-dimensional normal random vector is nonzero 



12 



Y. LI AND T. HSING 



described at the beginning of Section 2 of Fan and Lin (1998), where the 
power of the Neyman test (likelihood-ratio test) was shown to converge to 
the size of the test as the number of dimension increases. Essentially, the 
problem that they describe is caused by the fact that the Neyman test has 
a rejection region that is symmetric in all components of the vector, which 
is designed to treat all possible alternatives uniformly. Fan and Lin (1998) 
argued that the alternatives that are of the most importance in practice are 
usually those in which the leading components of the Gaussian mean vector 
are nonzero, and they modified the Neyman test accordingly such that the 
test will have satisfactory powers for those alternatives. 

We now introduce a test inspired by Fan and Lin (1998) that avoids having 
to pick a specific m. To test Hq : K < Kq against H a : K > Kq, we compute 
^K ,(m) f° r all of m = Kq + 1, . . . , N, for some "large" N; we then take the 
maximum of the standardized versions of these test statistics, and the null 
hypothesis will be rejected for large values of the maximum. To facilitate 
this approach, we present the following result that is a deeper version of 
Theorem 3.1 and shows that the test statistics ^K ,(m) has a "partial sum" 
structure in m as the sample size tends to oo. 

Theorem 3.4. Suppose that (CI) holds and X is a Gaussian process. 
Assume that K < Kq and let H > Kq + 1. Let > 1, be i.i.d. \ 2 random 
variables with H — Kq — 1 degrees of freedom and define 3£< m \ = Yli=i Ko Xi, m > 
Kq + 1. Then, for all positive integers N > Kq, the collection of test statistics 
^K ,(m) i m = Kq + 1, . . . , N, are jointly stochastically bounded by 3£< m \ , m = 
Kq + 1,...,N, as the sample size n tends to oo. 

In view of Theorem 3.4, we propose the following. To test Hq : K < Kq 
versus H a : K > Kq , define 

a , ? Ko>{m) -(m-K )(H-K -l) 
uk ,n '■= max 



K +l<m<N yj2(m- Kq){H - Kq - 1) 
and we reject Hq at level a if ^k ,n > Ua where u a is the 1 — a quantile of 

& im) -(m-K ){H-K -l) 



>K ,N 



max 



K +l<m<N y/2(m- K )(H -Kq-1) 

The resulting test resembles asymptotically the aforementioned test in Fan 
and Lin (1998) which was referred to as an adaptive Neyman test. For con- 
venience, we will also refer to our test as adaptive Neyman test, although 
the contexts of the two problems are completely unrelated. 

Suppose that Hq holds and mK Q is the smallest m such that K( m ) = Kq . 

Then, by Theorem 3.1, ^Ko,n - ^K ,m Ko -i @K ,N ~ @K ,m Ko -l- Thus, 
3$K ,N is, intuitively, a tight asymptotic stochastic bound for ^k ,n- 
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Simulation results show that the maximum in the definition of &k ,n is, 
with high probability, attained at relatively small m's. Thus, the test is quite 
robust with respect to the choice of N. In practice, one can pick TV so that 
there is virtually just noise beyond the iVth sample principal component. 
Numerically, the performance of the adaptive Neyman test matches those of 
the x 2 tests in which m is chosen correctly, but does not have the weakness 
of possibly under-selecting m. 

3.3. Discussion. 

(a) Our procedures apply to both finite-dimensional and infinite-dimensional 
data, and, in particular, are useful for treating high-dimensional mul- 
tivariate data. In that case, Li's \ 2 test suffers from the problem of 
diminishing power as does the test developed in Schott (1994); see, for 
example, Table 4 in Schott (1994). Our procedures can potentially be a 
viable solution in overcoming the power loss problem in that situation. 
The inclusion of measurement error in X provides additional flexibility 
in modeling multivariate data. Note that the formulation of Theorem 2.1 
can be extended to accommodate measurement error: if X = X\ + X2 
where X\ is the true covariate with mean [i and covariance matrix 
Txi and X2 is independent measurement error with mean zero, then 
E(X\Y)=K(X 1 \Y) and so lm(T x \ Y ) C span(r Xl /3i, . . . ,T Xi Pk)- Thus, 
one might speculate that our procedures continue to work in that case, 
and this is borne out by simulations presented in Section 4.3. Detailed 
theoretical investigation of this is a topic of future work, but prelimi- 
nary indications are that the extension of Theorem 2.1 is valid at least 
under the additional assumption that the components of X2 are i.i.d. 
with finite variance. 

(b) Choice of slices: in the SIR literature, the prevailing view is that the 
choice of slices is of secondary importance. In our simulation studies, 
we used contiguous slices containing roughly the same number of Yi's, 
where the number of Y^s per slice that we experimented with ranged 
from 25 to 65. Within this range, we found that the number of data per 
slice indeed had a negligible effect on the estimation of K. 

(c) Choice of a: if a is fixed and m and N are chosen sensibly in the \ 2 
tests and the adaptive Neyman test, respectively, then the asymptotic 
results show that the probability of correct identification of K tends to 
1 — a as n tends to 00. In real-data applications, the optimal choice of 
a depends on a number of factors including the sample size and the 
true model. In our simulation studies, presented in Section 4, a = 0.05 
worked well for all of our settings. 

(d) Limitations of SIR: the failure of SIR in estimating the EDR space 
in situations where Y depends on X in a symmetric manner is well 



14 



Y. LI AND T. HSING 



documented. While exact symmetry is not a highly probable scenario 
in practice, it does represent an imperfection of SIR which has been 
addressed by a number of other methods including SAVE in Cook and 
Weisberg (1991), MAVE in Xia et al. (2002) and integral transform 
methods in Zhu and Zeng (2006, 2008). The estimation of K based on 
those approaches will be a topic of future research. 



4. Simulation studies. 



4.1. Simulation 1: Sizes and power of the tests. In this study, we consider 
functional data generated from the process 

oo oo 

X(t) = ^fc-i^fc-iV^ cos(2fc-7rt) + y^^ll 2 r]2kV^sm(2kirt), 

k=l k=l 

where lu^ = 20(k + 1.5) -3 . Thus, the principal components are the sine and 
cosine curves in the sum. We will consider the cases where the rj^'s follow 
Normal (0, 1) and centered multivariate t distribution with v = 5 degrees of 
freedom, with the latter representing the situation where X is an elliptically 
contoured process. Note that the centered multivariate t distribution with 
v degrees of freedom can be represented by 



t = Zj y^r/(u — 2) ~ t v where Z ~ iV(0, /) and r ~ xt are independent. 

To simulate {771 , 772 , • • •} in that case, we first simulate z%, Z2 ■ ■ ■ ~ i.i.d. Normal(0, 
l)j r ~ Xu> an d then put rjk = z^j \Jt j(y — 2). By this construction, any fi- 
nite collection of the %'s follows a multivariate t distribution, where the 
r/fc's are mutually uncorrelated but not independent. 
Let the EDR space be generated by the functions 

/3i(t) = 0.9\/2cos(27rt) + 1.2\/2cos(47rt) 

- 0.5-\/2cos(87rf) + V — - — — cos(2kirt), 

fe>4 ( 2fc - X ) 

/3 2 (t) = -0.4\/2sin(27rt) + 1.5^2 sin(47rf) - 0.3\/2sin(67rt) 

f — l) k \/2 

+ 0.2\/2sin(87rt) + ^ V ' ' 3 sin(2/cvri), 
fc>4 \ ' 

p 3 (t) = \/2cos(2vrt) + \/2sin(47rt) + 0.5\/2cos(67ri) + 0.5\/2sin(87rt) 

+ E (4^)3 cos i 2 ( 2k - i)**} + E j^kj sin ( 4 ^)- 



(4.1) 
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Consider the models 

Model 1: Y = 1 + 2sin((/3i, X)) + e, 

Model 2:Y = (p u X) x (2((3 2 ,X) + 1) + e, 

Model 3:Y = 5((3 1 ,X)x(2((3 2 ,X) + l)/(l + (p 3 ,X) 2 ) + e, 

where e ~ Normal(0, 0.5 2 ). The EDR spaces of the three models have dimen- 
sions K = 1,2 and 3, respectively. Also note that Kr m \ = K if m > 1, 2 and 
3, respectively, for the three models. 

In each of 1000 simulation runs, data were simulated from models 1-3 
for the two distributional scenarios that X is distributed as Gaussian and 
t with two sample sizes n = 200 and 500. To mimic real applications, we 
assumed that each curve Xi is observed at 501 equally-spaced points. We 
then registered the curves using 100 Fourier basis functions. A functional 
principal component analysis was carried out using the package fda in R 
contributed by Jim Ramsay. 

To decide the dimension of the EDR space, we compared the two proposed 
X 2 tests and the adaptive Neyman test. For the x 2 tests, we let m = 5, 7 and 
30, where the first 5, 7 and 30 principal components of X, respectively, 
account for 91%, 95% and 99.59% of the total variation. We present the 
results for m = 30 as an extreme case to illustrate the point that using a 
large number of principal components will cause the tests to have lower 
powers. For the adaptive Neyman test, we took N = Kq + 30 and simulated 
the critical values for ^k ,n based on the description following Theorem 
3.4. We only report the results based on H = 8 slices, but the choice was 
not crucial. The nominal size of the tests was set to be a = 0.05. 

The simulation results are briefly discussed below. Table 1 gives the em- 
pirical frequencies of rejecting Hq : K < 1. Since the dimension of EDR space 
under model 1 is equal to 1, the results in the column under model 1 give 
the empirical sizes of the tests. Models 2 and 3 represent two cases under 
the alternative hypothesis, therefore the results in those columns give the 
power of the tests. As can be seen, when m is 5 or 7, the two x 2 tests have 
sizes close to the nominal size and have high powers. However, for the case 
m = 30 and n = 200, the tests performed significantly worse in those met- 
rics. On the other hand the adaptive Neyman test performs very stably, with 
powers comparable to those of x 2 tests with a well chosen m. It is also worth 
noting that when X has the t distribution, the x 2 test performs comparably 
to, sometimes better than, the adjusted x 2 test, showing that the x 2 test is 
quite robust against departure from normality. 

Table 2 shows that the empirical frequencies of finding the true dimensions 
for different situations. As can be expected, estimating the true dimension 
becomes more challenging as the model becomes more complicated. For 
example, the probabilities of finding the true dimension for model 3 are 
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Table 1 

Empirical frequencies of rejecting the hypothesis Ho ■ K < 1. The results are based on 
1000 simulations for each of the three models, two distributions of process X{t), and two 
sample sizes. The \ t es t an d the adjusted \ are applied with fixed m values, and the 
adaptive Neyman test is applied with N — Ko + 30 



Model 1 Model 2 Model 3 





Distribution of rj 


Normal 


t 


Normal 


t 


Normal 


t 


n = 200 


X 2 test (m = 5) 
Adj. x 2 (m = 5) 


0.040 
0.070 


0.046 
0.082 


0.883 
0.894 


0.567 
0.584 


0.995 
0.997 


0.949 
0.954 




X 2 test (m = 7) 
Adj. x 2 (m = 7) 


0.045 
0.071 


0.039 
0.083 


0.827 
0.868 


0.496 
0.537 


0.982 
0.989 


0.910 
0.924 




X 2 test (m = 30) 
Adj. x 2 (m = 30) 


0.034 
0.105 


0.020 
0.072 


0.320 
0.484 


0.111 
0.299 


0.677 
0.798 


0.493 
0.668 




Adaptive Neyman 


0.044 


0.045 


0.860 


0.545 


0.993 


0.936 


n = 500 


X 2 test (m = 5) 
Adj. x 2 (m = 5) 


0.052 
0.069 


0.043 
0.056 


1.000 
1.000 


0.977 
0.969 


1.000 
1.000 


1.000 
1.000 




X 2 test (m = 7) 
Adj. x 2 (m = 7) 


0.048 
0.056 


0.033 
0.049 


1.000 
1.000 


0.958 
0.949 


1.000 
1.000 


0.999 
0.999 




X 2 test (m = 30) 
Adj. x 2 (m = 30) 


0.059 
0.085 


0.025 
0.054 


0.958 
0.972 


0.584 
0.666 


0.999 
0.999 


0.990 
0.991 




Adaptive Neyman 


0.052 


0.040 


1.000 


0.963 


1.000 


0.999 


much smaller than those for 


model 2. 


Our 


simulation 


results 


also show that, 



for a range of small values of m, the two x 2 procedures perform very well 
especially if n = 500, where for brevity those results are represented by m = 5 
and 7 in Table 2. However, when m = 30, the probabilities of finding the true 
dimension become smaller for those procedures, which is especially true for 
models 2 and 3. This is another illustration that using a large number of 
principal components will lead to a loss of power for the underlying \ 2 tests. 
Again, the adaptive Neyman procedure performs comparably to the two x 2 
procedures with a well-chosen m. 

4.2. Simulation 2: Sensitivity to the truncation point m. In the exam- 
ples in Section 4.1, the three EDR directions are linearly independent when 
projected onto the three leading principal components. Hence, the two x 2 
procedures are expected to work so long as m > 3. For situations where one 
or more EDR directions only depend on high-order principal components, 
the choice of m in the x 2 procedure is crucial and the adaptive Neyman 
procedure has a clear advantage. 

To illustrate this, we consider a new model 

Model 4: Y = {(3 1 ,X) x (2(/3 4 , X) + 1) + e, 
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Table 2 

Empirical frequencies of finding the true dimension of the EDR space. The results are 
based on 1000 simulations for each of the three models, two distributions of process X(t), 
and two sample sizes. The x i^st and the adjusted \ are applied with fixed m values, 
and the adaptive Neyman test is applied with N = Kq + 30 



Model 1 Model 2 Model 3 





Distribution of 77 


Normal 


t 


Normal 


t 


Normal 


t 


n = 200 


X 2 test (m = 5) 
Adj. x 2 (m = 5) 


0.960 
0.930 


0.954 
0.918 


0.859 
0.846 


0.562 
0.556 


0.322 
0.393 


0.165 
0.224 




X 2 test (m = 7) 
Adj. x 2 (m = 7) 


0.955 
0.929 


0.961 
0.917 


0.809 
0.832 


0.488 
0.505 


0.272 
0.313 


0.116 
0.167 




X 2 test (m = 30) 
Adj. x 2 (m = 30) 


0.966 
0.895 


0.971 
0.924 


0.309 
0.462 


0.111 
0.277 


0.057 
0.119 


0.026 
0.079 




Adaptive Neyman 


0.956 


0.955 


0.843 


0.542 


0.337 


0.158 


n = 500 


X 2 test (m = 5) 
Adj. x 2 (m = 5) 


0.948 
0.931 


0.957 
0.944 


0.955 
0.948 


0.958 
0.910 


0.842 
0.849 


0.629 
0.648 




X 2 test (m = 7) 
Adj. x 2 (m = 7) 


0.952 
0.944 


0.967 
0.951 


0.959 
0.949 


0.948 
0.898 


0.739 
0.754 


0.489 
0.551 




X 2 test (m = 30) 
Adj. x 2 (m = 30) 


0.941 
0.915 


0.975 
0.946 


0.913 
0.910 


0.582 
0.625 


0.279 
0.308 


0.101 
0.203 




Adaptive Neyman 


0.948 


0.960 


0.967 


0.952 


0.843 


0.613 



where X is a Gaussian process whose distribution is as described in Section 
4.1, (3i is as in (4.1), but $4 is given by 

/3 4 (t) = 0.45\/2 cos(27rt) + 0.6v / 2cos(4vrt) - 3v^sin(67rf) 

(—l) k \/2 

+ 1.2\/2sin(87rt) + ^ V ' ' 3 sin(2/c7rt). 

fc>4 ^ ' 

In this model, the dimension of the EDR space is 2, but the projections of 
Pi and /?4 onto the first five principal components are linearly dependent; 
indeed, i£7 m ) = l,m < 5, and Kr m \ = 2,m > 6. As shown in Table 3, the 
two x 2 procedures with m = 5 both failed to find the true dimension, even 
when n = 500. On the other hand, when n = 500 and m = 7, the x 2 proce- 
dures worked very well. With m = 30, both x 2 tests again have considerably 
lower powers, which leads to smaller probabilities of correct identification. 
As in the previous models, the adaptive Neyman procedure has comparable 
performance to the best x 2 procedures. 

Finally, we use model 4 to illustrate the null distribution of &Ko,(m) an d 
^K (m) wnen K(m) < Ko- Consider Kq = 2; for each m = 4, 5, . . . , compute 
the expected values of &K ,(m) an d t m \ by simulations and compare 
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Table 3 

Empirical frequencies of finding the correct model in model 

4 



n = 200 n = 500 



X 2 test (m = 5) 
Adj. x 2 (m = 5) 


0.040 


0.047 


0.068 


0.068 


X 2 test (m = 7) 
Adj. x 2 (m = 7) 


0.358 


0.913 


0.410 


0.899 


X 2 test (m = 30) 
Adj. x 2 (m = 30) 


0.085 


0.566 


0.170 


0.616 


Adaptive Neyman 


0.229 


0.885 



the expectations with theoretical expectations (m — Kq)(H — Kq — 1). The 
results are described in Figure 1, in which the grey rectangles mark the 
means of 3^K ,{m)i the black circles mark the means of and straight 

line represents (m — Kq)(H — Kq — 1). The case m > 6 correspond to (i) of 
Theorem 3.1, for which xf m _K )(H-K -i) * s ^ e asymptotic distribution of 
the test statistics; the cases m = 4 and 5 correspond to (ii) of Theorem 3.1, 
for which xf m _K )(H-K -i) 1S on ^ a stochastic bound of the test statistics. 
Both of these points are clearly reflected in Figure 1. 

4.3. Simulation 3: Multivariate data with measurement errors. In this 
subsection, we present a simulation study for high-dimensional multivari- 
ate data; in particular, we use the study to support claims made in (a) of 




1 5 6 7 8 9 10 



Fig. 1. The expected values of the test statistics S^K ,(m) an d 3~k (m) plotted is a func- 
tion of m; the solid line describes the theoretical expected values, the rectangles are the 
means of !?K Q ,{m) an d the circles are the means of 2?k q (m)- 
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Section 3.3. Assume that X is multivariate, and, for clarity, denote X by 
X here. The simulated model is described as follows. We first generate a p- 
dimensional variable X = (X^Xj ) T , where p is "large," with Xi denoting 
a 10-dimensional random vector while X2 = O p _io, so that Xi contains the 
real signal in X. Suppose that Xi has a low-dimensional representation 

5 

fc=i 

where the t/^'s are orthonormal vectors, ~ Normal(0, oof.) are independent, 
and (cji, . . . , W5) = (3, 2.8, 2.6, 2.4, 2.2); the are randomly generated, but 
are fixed throughout the simulation study. Furthermore, instead of observing 
the true X, assume that we observe an error-prone surrogate, 

w = x + u, 

where U ~ Normal (0, I p ) is measurement error. Thus, the eigenvalues of 
the covariance of W are bounded below by 1. Note that this is a simpler 
measurement-error model than the one considered in Carroll and Li (1992), 
but realistically portrays certain crucial aspects of high-dimensional data en- 
countered in practice; for example, in a typical fMRI study the total number 
of brain voxels captured by the image is huge but often only a relatively small 
portion of the voxels are active for the task being studied, while background 
noise is ubiquitous. 

Let X\, . . . , X p be the components of X. Consider the model 

Model 5: Y = {X l + X 2 )/(X 2 + X 3 +X 4 + X 5 + 1.5) 2 + e, 

where e ~ Normal(0, 0.5 2 ). Thus, Y only depends on Xi. Below we compare 
the x 2 procedure in Li (1991) and the adaptive Neyman procedure using W 
as the observed covariate. 

We conducted simulations for n = 200, 500 and p = 15, 20, 40, 100. For each 
setting, we repeat the simulation for 1000 times and used Li's procedure and 
the adaptive Neyman procedure in deciding the number of EDR directions. 
For both procedures, the nominal size a = 0.05 was used. In Table 4, we 

Table 4 





Empirical frequencies 


of finding the 


correct model 


in model 5 








p= 15 


p = 20 


p = 40 


p= 100 


n = 200 


Li's x 2 test 


0.328 


0.258 


0.123 


0.007 




Adaptive Neyman 


0.588 


0.596 


0.562 


0.528 


n = 500 


Li's x 2 test 


0.898 


0.833 


0.612 


0.276 




Adaptive Neyman 


0.955 


0.956 


0.953 


0.960 



20 



Y. LI AND T. HSING 



Table 5 

Empirical sizes of Li 's \ 2 test and the adaptive Neyman test for Ho : K < 2 under model 5 







p = 15 


p = 20 


p = 40 


p= 100 


n = 200 


Li's x 2 test 


0.015 


0.012 


0.002 


0.001 




Adaptive Neyman 


0.016 


0.013 


0.009 


0.010 


n = 500 


Li's x 2 test 


0.044 


0.042 


0.025 


0.013 




Adaptive Neyman 


0.037 


0.035 


0.035 


0.030 



summarize the empirical frequencies of finding the correct dimension. As can 
be seen, while the performance of the adaptive Neyman procedure is quite 
stable for different p's, the performance of Li's procedure deteriorates as p 
increases. In Table 5, we also present the true sizes, obtained by simulations, 
of the two tests for Hq :K<2. In all cases both tests have sizes under 0.05. 
The sizes of both tests are closer to the nominal size when n = 500 than 
when n = 200. With a fixed n, the sizes of Li's test decrease quickly as p 
increases, reflecting the conservative nature of the test for large p, while 
those for the adaptive Neyman test remain relatively stable. 

5. Data analysis. In this section, we consider the Tecator data [Thodberg 
(1996)], which can be downloaded at http:/ /lib. stat.cmu.edu/datasets/tecator 
The data were previously analyzed in a number of papers including Ferre 
and Yao (2005), Amato, Antoniadis and De Feis (2006) and Hsing and Ren 
(2009). The data contains measurements obtained by analyzing 215 meat 
samples, where for each sample a 100-channel, near-infrared spectrum was 
obtained by a spectrometer, and the water, fat and protein contents were also 
directly measured. The spectral data can be naturally considered as func- 
tional data, and we are interested in building a regression model to predict 
the fat content from the spectrum. Following the convention in the litera- 
ture, we applied a logistic transformation to the percentage of fat content, 
U, by letting Y = log 10 {C//(l - U)}. 

In applying functional SIR, both Ferre and Yao (2005) and Amato, An- 
toniadis and De Feis (2006) used graphical tools to select the number EDR 
directions, where the numbers of directions selected were 10 and 8, respec- 
tively. On the other hand, using only two EDR directions, Amato, Antoniadis 
and De Feis (2006) applied MAVE to achieve a prediction error comparable 
to what can be achieved by SIR using 8 directions. These conclusions were 
somewhat inconsistent. 

Based on the instructions given by the Tecator website, we used the first 
172 samples for training and the last 43 for testing. Following Amato, An- 
toniadis and De Feis (2006), we focused on the most informative part of 
the spectra, with wavelengths ranging from 902 to 1028 nm. The curves are 
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Tecator Data EDR directions 




" 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 -1.5 -1.0 -0.5 0.0 

t t V 

Fig. 2. Tecator spectrum data: the first plot shows the Tecator spectrum data in the 
training set, the second plot show the estimated EDR directions and the last plot is the 
predicted vs. true fat contents for the test data set. 



rescaled onto the interval [0, 1] . The first plot in Figure 2 shows those spectra 
in the training set. 

We first fitted B-splines to the discrete data, and then applied our 
sequential-testing procedures. With a = 0.05, the adaptive Neyman pro- 
cedure concluded that K = 3. To see how well a three-dimensional model 
works, the model Y = f({{3±,X}, (f32,X), (/?3,X)) + e was entertained. The 
EDR directions were estimated by the regularized approach, RSIR, intro- 
duced by Zhong et al. (2005); the estimated EDR directions are presented 
in the center plot in Figure 2. Finally, the link function / was estimated by 
smoothing spline ANOVA [Gu (2002)] with interaction terms; the estimated 
model was then applied to test data to predict fat content. The root mean 
prediction error was 0.062 which is comparable to what was obtained by 
MAVE in Amato, Antoniadis and De Feis (2006). The plot of the predicted 
versus the true fat contents for test data is also given in Figure 2. 

Our result is in agreement with what was obtained using MAVE in Amato, 
Antoniadis and De Feis (2006) in that a low-dimensional model is appropri- 
ate for this data set. 

APPENDIX: PROOFS 

In the following, the notation "V refers to symbolic matrix multiplication; 
for instance, if /i, . . . , fk are mathematical objects (functions, matrices, etc.) 
and c = (ci, . . . , Cfc) T is a vector for which the operation J2i=i °ifi * s defined, 
we will denote the sum by (/i, . . . , fk) * c; also if C is a matrix containing 
columns ci, . . . , ci, the notation (/i, . . . , fk)*C refers the array [(fi, ■ ■ ■ , fk)* 

Cl,...,(/l,...,/*:)*C^. 

The notation is defined in Section 2 and will be used extensively below 
without further mention. 
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A.l. Proof of Theorem 3.1. Recall from (2.12) that V {m) = B {m ^Bj m) 

and V( m ) = B^ m ^Bj m y Thus, to study the eigenvalues of V( m ) and V( m ), we 

can equivalently study the singular values of -B( m ) and £>( m ), respectively. 
Recall that if( m ) = Rank(.B( m )). Under the hypothesis K < Kq, we also have 
-f^(m) ^ Kq. Suppose -B( m ) has the following singular- value decomposition: 

5 M = ^(o o)^' 

1 /2 1 /2 

where D := diag(A x (V( m )), . . . , ^ (V(m))) contains the nonzero singular 
values of -B( m ) , and 2? and =S are orthonormal matrices of dimensions mxm 
and H x H, respectively, which contain the singular vectors of B^ m y Note 

that, for brevity of notation, we leave out m in and n in -B( m ),V( m ) 

in this proof. 

Partition and B as ^ = [^il^], ^ = [Bi\B 2 ] where and 
both have Ki m ^ columns, and &2 and J?2 have m — 2f( m ) and H — if( m ) 
columns, respectively. Thus, the columns of &2 and are singular vectors 
corresponding to the singular value 0, and so B7^2 = and B^^ = 0. 
We further partition £?2 m the following way. Recall that fix,..., [in are the 
within-slice means defined in (2.10). By Theorem 2.1, span(/zi —//,..., — 
fj,) is a subspace of span(Tx/3i, • • • ,YxPk) and therefore has dimension less 
than or equal to K < Kq. It follows from the "rank-nullity theorem" that 
there exists a matrix £?2o of dimension H x (H — Kq) with orthonormal 
columns such that 

(A.l) (m -h,...,hh-ijl)* (FB 2o ) = 0. 

Furthermore, observe that g spans the null space of F and so (fii — fj,,..., hh — 
fj,) * (Fg) = 0. Without loss of generality, let g be the last column of =£?2o- 
Define an operator T : <¥f — > M m by 

(A.2) Tx = (o;~ 1/2 (^ 1 ,x),...,a;- 1 / 2 (V m ,x)) T , x e Jf. 

Applying T to both sides of (A.l), we have 

(A.3) B {m) B 2 o = MFB 2o = T{([ii — //,..., [in — /u)}F^ 2 o = 0, 

where, for convenience, the notation T{(//i — /i, . . . , fin — A 4 )} means (T(fi\ — 
[/,), . . . , T(fiH — /-0)- This means is contained in the column space of =22 • 
Without loss of generality, we assume that =£?2 has the decomposition 

(A.4) Jg 2 = [^2*|^2o], 

where £?2* is of dimension x (Kq — K/ m yj. When m is large enough so 
that K r m \ = Kq , then B2 = B20 ■ 
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Let 



(A.5) U n := ^B {m) £ 2 = (B (m) - B {m) )J2 2 . 

Also define 

(A.6) T\ (m) = — y^Vi^m) 1 ^ ^ Sh) and M = [0i,( m ), . . . ,^,( m )] mX H- 



Lemma 1. Assume that X(t) has an elliptically contoured distribution 

satisfying (2.7). Let M be defined by (A.6). We have y/n£PjMG 3TA, 
where 2? is a (m — Kr m j) x H matrix of independent Normal(0, 1) random 

variables, A = diag(r 1 1/ ' 2 , . . . , r^/ 2 ) with = E(0 2 |Y G Sh). 
Proof. Let 

j n 1 n 

U n ,h = ~^2 ^i,M J ( y i G S h), Pn,h = ~^2 I ( Y i G S h), 
n t=l n i=l 

then i\( m ) = Un th /p n>h , denote u h = E(u n)/l ) = & h ,{m)Ph- Then 

M = 

and 

M 

and so 



Pn,l Pn,H J mxH 



Ul Utf 

Pi PH/ mxH 



n x l 2 (M-M) 

,l/2^ u n,l u l u n,H U/f^ 



n 



Pn,l Pi Pn,.ff Ptf/ 

1/2 / u n,l ~ u l Un,H — Ufl 



j • ■ • 5 



Pn,l Pn,i? 



1/2/ u i / \ u # 

' (Pn,l -Pi),..., 

VPn.lPl P 

1/2 f U n,l ~ u l u n,g ~ 



(A.7) -n 1 (p„,i -pi),..., (p n ,H-PH. 

\Pn,lPl Pn,HPH 



Pi PH 



-n 1/2 ( "4(Pn,l -Pl),---,-^f(Pn,H ~Ph) ) 

VPi Pa- / 

+ o P (l). 
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By the central limit theorem and covariance computations, it is easy to 
see that the columns of y/n^^^M — M)G are asymptotically independent, 
where the /ith column converges in distribution to a random vector having 
the multivariate normal distribution 

(A.8) Normal(0,Var(^77 (m) |y £ S h )). 

For convenience, let Y denote the vector {(/3\,X), . . . , (/3k, X)). By iterative 
conditioning, 

Var(^ 2 T r7 (m) \Y £ S h ) = E{Var(^ 2 T r7 (m) \f , Q,Y,e)\Y £ S h } 

+ Var{ ^E(rj (m) \ V , 6, Y, e) \ Y £ S h } 
= E{Var(^ 2 T r7 (m) |r,e)|y£54 

+ Var{^ 2 T E(77 (m) |y,G)|y£5 ft }, 

where we used the facts that Y is redundant given e and the (/3fc, X)'s, and 
X is independent of e. With the notation Y = ((f3\, X), . . . , (/3k, X)), we 
have 

Var(^ 2 T r7 (m) |y £ S h ) = E{9 2 Var(^ 2 T *7 (m) |r)|y £ 5 h } 

(A.9) 

+ Var{e 2 E(^ 2 T f ?(m) |r)|y£54. 

In the following, we focus on the special case X is Gaussian. The general case 
is similar but requires a more careful analysis of the conditional distribution 
of jointly elliptically contoured random variables. Let b be any column of 
^» 2 - Then 

E(b T *7 (m) ) = and E(b T f) (m) (/3 k , X)) = b T b fci(m) = 0, l<k<K. 

Thus, &2'H{m) i s a vector of standard normal random variables that are 
independent of the (/3fc,X)'s. It follows from (A.9) that 

(A.10) Var(^ 2 T 77 (m) |y £ S h ) = E(e 2 \Y £ S h )I = r h I, 

where / is the identity matrix. The proof is complete. □ 

Lemma 2. Let X(t),£t and A be as in Lemma 1. Then ^fnXJ n — > Z 
where all of the entries of Z are normally distributed with mean zero and 
have the following properties: 

(i) // K {m) = K , then Z = 2?Af s J2 2 . 

(ii) If if( m ) < Kq, then Z can be partitioned as Z = [Z*\Z Q ] in accordance 

with the partition of £?2 in (A. 4), where Z = i2°A a f s ^2o! furthermore, if 
X is Gaussian then Z* and Z Q are independent, where the last column of Z Q 
is identically while the rest of the entries of Z Q are i.i.d. standard normal. 
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Proof. First, write 
n 1/2 U n = n 1/2 &> 2 T MF£ 2 

= n l / 2 ^{MF + (M — M)F + {M -M){F - F) + M(F - F)}£ 2 . 
Denote X h = Xd{Yi G S h )/ J2i I<Xi G S h ). Then 

M = {(QJ 1/ %,X h - X)}jf =v M = {(u- 1 %,X h - /i)}^. 
Then 

M-M = { (u,- l/ % - oj- 1 % ,X h -X) YtfL x 

(A.ll) 

-{(u; 1/2 ^X-»)}™£r 

It follows that 

^(t) - ^(t) = -^^((fx - T X )1>l,*l>j) + Cyn" 1 ), 

(A-12) 

Qj -Uj = ((T x -T x )^e,ipj) + O p (n ). 

These were established by (2.8) and (2.9) in Hall and Hosseini-Nasab (2006) 
for J4? = L 2 [a, b]. Actually, they hold for any Hilbert space Jif; see Eubank 
and Hsing (2010), Theorem 3.8.11. Since T x - T x = O^n' 1 / 2 ), these im- 
ply Qj = ujj + O p (n~ 1 / 2 ) and ^ = tpj + O p (n _1 / 2 ). Also X -fi = O p (n~ 1 / 2 ). 

Thus, M — M = (^(n" 1 / 2 ). Since we also have F — F = O p (n _1//2 ), we con- 
clude that 

n l l 2 ^{M -M){F-F) = O p {n- l l 2 ). 
Similarly, since &>JM = 0, 

n 1/2 ^jM(F — F) = n 1/2 ^ T (M - M){F - F) = O p (n~ 1/2 ). 

Thus, 

(A.13) v>l 2 U n = n 1/2 0$MF£ 2 + n 1/2 ^ 2 (M - M)F£ 2 + o p (l). 

To get the desired result, we break the proof into several parts. First, we 
establish that 

(A. 14) (n l l 2 ^(M - M)F£ 2 ,n l l 2 @> 2 (M -M)F£ 2 ) {Zi,Z 2 ), 

where (Z\,Z 2 ) are jointly normal with mean 0. By (A.ll) and the fact that 
(1,...,1)F = 0, we have 

n l l 2 {M - M)FJ2 2 = n x l 2 {l$f l % - ^ 1/2 ^,X h - X)}™f =1 F£2 2 . 
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Since ujj 1 ' 2 ^ - = O p (n -1 / 2 ) and X h - X fi h - fj,, 

nV\M-M)F3 2 = n^{(Qf% - uf'^n)}^-?™ 

(A.15) 

+ Gp(l), 

where 

(71 , . . . , 1H-K {m) ) = (ni-ii,...,iiH-p)* (F£ 2 )- 

Note that the last H — Kq of the 7fc's are equal to by (A.l). In particular, if 
K( m \ = Kq then all of the are equal to and (A. 14) is established with 

Z 2 = and Z x = 2TA& S £ 2 by Lemma 1. The assertion (i) follows readily 
from (A. 13). Below, we focus on the case if( m ) < Kq. Recall that 

{( w ; 1 %, 7fc >}j£^->=M^ = o > 

which implies that 

(A.16) m,7h}}T=i K<m) =°- 

By (A.15) and (A.16), 

(A.17) n 1 '\M-M)F£to = n 1 'H(uJ 1/ \$ j -fy),n^ 

By the central limit theorem, the random element n _1//2 (i? — R,u n h — 
u h,Pn,h — Ph-, h = 1, . . . , H) has a jointly Gaussian limit. In view of (A. 7), 
(A. 12) and (A.17), the claim in (A. 14) is established by performing a linear 
transformation. Define the partitions Z\ = [Z\*\Z\ \ and Z 2 = [Z 2if \Z 2o \ and 
so [Z*\Z ] = [Zu + Z 2 *\Zi ] since Z 2o = 0. By Lemma 1, 

(A.18) [Zu\Z l0 ] = \%kj % £ 2 *\2?k2r % £ 20 \ 

and so Z = Z\ = i2°A ^ s ^ 2o . Assume for the rest of the proof that X 
is Gaussian. Recall that = I — gg T . By the fact that = 1 and the 
convention that g is the last column of J2 2 , it follows from (A.18) that 

\Z\jf\Zio\ = \3f 'J3 2 *\3f J2 2o ], 

where J2 2o denotes the matrix whose last column contains 0's but the re- 
maining entries are taken after J2 2o . By Lemma 1, Zi* and Z\ are indepen- 
dent. So it remains to show that Z 2 * and Z\ are independent. By (A. 12) 
and (A.16), with j k£ := (7*.,^), 
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oo „ 

V / (R - R)Wj + o„(l) 



1/2 -1/2 Ikt 

e=m+i J 



oo f n 



(A.!9) =„V V ./ 2 E I E&6 ,U„ >(1 ) 

i=m+l J I i=l J 



n 



n / oo 

1/2 E E 

i=i \i 
■ z jk + o p (l), 



j=l V=m+1 J 



for j = l,...,m, fc = 1,...,-Rr - K (m) . Let z fc = (z lfc , . . . , z mk ) T . By (A.17) 
and (A.19), 

(A.20) n l ' 2 ^(M - M)F£ 2 * = ^ 2 T [zi, . . .,z Ko _ K(m) ] + o p (l). 
Since MF£H 2 = and 0*2 u h = °i it follows from (A.7) that 

(A.21) n^^{M - M)F£? 2o = n 1 / 2 ^ ( . . . , ^) F£> 2o + o p (l). 

V Pi PH ) 

We now compute the covariances between the components of (A.20) and 
(A.21). Since the components are jointly normal, our goal is to show that 
the covariances are all 0. Let q be a column of £} 2o and k = 1,. . . , Kq — Kr m \ . 
Note that 

n 



i=l 



where 

/ oo 



^fc = diagf ^ bJ li ^ t - £n>3 = l,...,m )■ 
Vi=m+1 

Since E(^Jz fc ) = 0, 



CJj — ^ 



Cov (n'/^T ^ ^ Fq ^T z ; 

■ l(,v(g!^, . . . , * (Fq)) . 

Let be as defined in the proof of Lemma 1. By the same conditioning 
argument employed there, 

(A.22) E(n 1 / 2 ^Ju n ^ h zJ^ 2 )=E[E{^Jr ]{m) r 1 J m) ^ 2 \r}I(Y G S/J], 
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where the sub-index i is suppressed from the symbols on the right-hand 
side since it suffices to deal with a generic process (X,Y) in computing 
expectations. Note that V(m) an d "V are independent, rj/ m \ and &k are 
independent, and ?7( m ) , V , 3>k are normally distributed with mean zero. Then 
it is easy to conclude from (A. 22) that 

E(n 1 / 2 ^ 2 T u„ ift z^ 2 ) 

(A.23) = E(^ 2 T rj {m) rjJ m) ^ 2 )E[E{^ 2 %^ 2 \r}I(Y G S h )} 

= E[E{& 2 @ k & 2 \y}I{Y eS h )). 

By the property of the normal distribution, each (diagonal) element of 
E{^k\^} can be written as ^2f=i Cj {(3j , X — /i) for some Cj,l < j < K. For 
convenience, denote E{S>k\Y} as T(X — /i) where T is a linear functional. 
Thus, 

(A.24) E[E{^J& k ^ 2 \r}I(Y e S h )} = Ph & 2 T T(» h - fj)& 2 . 
As a result, 

= (^ 2 T T(/i 1 - /i)^ 2 , . . . , &lT( m - ii)& 2 ) * (Fq) 

= ^ 2 T ((T( m -//),.. . ,T(// H - /i)) * (Fq))£* 2 = 0, 

by (A.l). This shows that the covariances between the components of (A. 20) 
and (A. 21) are all equal to 0, and concludes the proof that Z\ and Z 2if are 
independent. □ 

Proof of Proposition 3.2. Assume for convenience that Z\ has full 
column rank. If this is not the slight modification of the proof below 

suffices. Denote the jth column of Z as Zj, and construct orthonormal vec- 
tors by applying the Gram-Schmidt orthonormalization to the columns of 
Z: 

vi — |] — [7, Vj — — — — r — -, j — ,mm(p,q), 

Nil llu — n j-i) z jll 

where = [vi, . . . , Vj_i][vi, . . . , Vj-_i] T is the projection matrix to the 

space spanned by zi, . . . , z,_i. The following properties can be verified: 

(a) vjzfc = for all pairs k < j. This is the result of the construction of the 
v/s. 

(b) Vj is independent of z/% for k > max(j, r). This follows from the assump- 
tion on Z 2 . 
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(c) vjz/j ~ Normal(0, 1) for k > max(j, r). The proof of this is easy: by (b) 
and the fact that ||vj|| = 1, (vjzfcjvj) ~ Normal(0, 1); since this condi- 
tional distribution does not depend on Vj , it is also the marginal distri- 
bution. 

(d) vjzfc and vj,z k i are independent if k > max(j, r) and k! > max(j',r). 
The proof is as follows. First for the case j,j',k < k' , we have 

P(vjz fc < x,vj,z k , < y) =E[P(vJz fc < x,vj,z k , < y\vj,Vf,z k )] 

= E[7(vjz fe < x)F(v],z k/ < y\v f )] 
= $(V)$(y), 

where the last step follows from (c). Next for j,j' < k = k! and j ^ f, 
we have 

F(vjz fe < x,v],z k < y) =E[P(vJzfc < x,v],z k < y\vj,v r )] = 3>(x)$(y) 
since vjvj/ =0. 

(e) vjzj = || (7 — IIj_i)zj || for j > r + 1 is the square root of a Xp-j+i vari- 
able, and it is independent of any vj,z k > with k' > f > r. The claims 
can be easily verified using conditioning arguments similar to those in 
(c) and (d). 

(f) vjzj is independent of vJ,Zj/, for j,f > r + 1 and j ^ f. This can be 
verified by checking the independence between (7 — Ilj_i)zj and (7 — 

n f _ 1 )z / . 

Based on (a)-(f), we conclude that the entries in [vi, . . . , v min ( p (? )] T Z' have 
the following properties: all entries below the diagonal are zero; all entries 
in the last q — r columns and on and above the diagonal are independent, 
where those above the diagonal are distributed as standard normal and the 
square of the jth diagonal element is distributed as Xp-j+i- 

Notice that if p < q, the v^-'s defined above already constitute a basis for 
W. If p > q, we can define Vj, j = q + 1, . . . ,p, such that they are orthogonal 
to all columns of Z, and to each other. Define V r = [v r +i, . . . , v p ]. By the 
nature of eigenvalues, 

j=r+l 

= min{tr($ T ZZ T $), 
(A.25) * 

$ is a p x (p — r) matrix with orthonormal columns} 

v 

<tr(V r T ZZ T V r )= E vJZ 2 Z 2 T Vj . 

j=r+l 
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It follows from the summary above that the last expression is a sum of 
independent y 2 random variable, and a simple calculation shows that the 
total degrees of freedom is (p — r)(q — r). □ 

PROOF of Theorem 3.1. To study the smallest m- Kq eigenvalues of 
, we can equivalently study the smallest squared m — Kq singular values 

of -B( m ) . By the asymptotic theory described in Dauxois, Pousse and Romain 
(1982) and Hall and Hosseini-Nasab (2006), it is straightforward to show that 
y/n(B( m ^ — -B( m )) converges in distribution. By Theorem 4.1 in Eaton and 
Tyler (1994), the pairwise difference between the smallest m — -ftT( m ) singular 
values of -B( m ) and the singular values of U n = &2 B{m)&2 is O p (n -3 / 4 ). So, 

for K( m ) = Kq , the smallest m — Kq eigenvalues of V( m ) are approximated by 
the complete set of eigenvalues of U n U^, while, for -fT( m ) < Kq, the smallest 
m — Kq eigenvalues of V( m ) are only approximated by a subset of eigenvalues 
of U n Un ■ We consider the two cases in more details below. 

(i) For i^( m ) = Kq, we will prove (3.4) from which (3.2) follows easily. It 
follows that i?2 = =£?2o and 

^ Ko ,( m )=ntT{U n U^) + o p {l). 

By (i) of Lemma 2, 

(A.26) Atr(iTAHAir T ), 

where is as given in Lemma 1 and E := ^gii^^J ^g- It is easy to see 
that ^ g and J?2=Sj are projection matrices with rank H — 1 and H — Kq, 
respectively. Since g is a column of £?2, we have J^i^Jg = g. As a result, 

S = / g J 2 J 2 T A = ^ 2 T - gg T , 

which is a projection matrix with rank and trace equal to H — 1 — Kq. Since 
A is full rank, Rank(AHA) = Rank(H) = H - K - 1. Let AAA T be the 
eigen decomposition of AHA where the column of A are the orthonormal 
eigenvectors of AHA and A = diag{<5i, . . . , <5e_j^ _i} contains the positive 
eigenvalues. Write 

m—Ko m—Ko m—KoH—Ko—1 

tr(ZAEAZ T ) = £ z.AHAz^ = £ Zl AAA? z J = £ E 

i=l i=l i=l k=l 

where z% is the ith row vector of Z, and Xik ^ s the ^th element of ZjA 
Clearly, the Xik are i-i-d. x 2 random variables with degree 1. 
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(ii) For if ( m ) < Kq , it follows that 

m 

j=K Q +l 

m-K {m) 
j=K -K {m) + l 

by Lemma 2. Since the last column of Z is identically zero, the last expression 
is equal to 

Aj{(Z*,Zo ^){Z*,z\ ') ), 

j=K Q -K (m) +l 

where z\ ^ denotes the matrix Z Q minus the last column. We apply Propo- 
sition 3.2, with Z\ = Z* , Z2 = z\ 1 ,p = m — Ki m ^,q = H — Kt m ) — 1 and 
r = Kq — Kr m \ , to obtain the desired result. □ 

A.2. Proof of Theorem 3.3. 

Lemma 3. Wr m \ has the same column space as Br m y 

Proof. By definition, Wi m ) = B^A(A ^/gA) - , therefore the column 
space of is contained in that of Br m \ . Suppose the column rank of Wi m ) 
is strictly less than that of Bi m \ . Then there exits a nonzero vector x £ W 71 
such that x T W (m ) = but x t B (to) / 0. Since B {m) g = 0, B {m) / S = B {m) 
and so 

(A.27) = x T W (m) =x T i? (m) A- 1 (A^ g A)(A^ g A)-. 

Observe that A _1 g spans the null space of A J?^A. Since x T £>( m ) 7^ 0, we 
conclude that x T i?(, m )A~ 1 = (^A^g) 1 " for some constant 5. Thus, x T £>( m ) = 
<5g T . Since £?( m )g = 0, it follows from (A.27) that ||x T I?( m ) || 2 = x T ' B^BT^x 

Sg^BT^x = 0, which leads to a contradiction to the assumption that x T .B( m ) 
0. The only possibility left is that the column space of Wr m ) is the same as 

B( m ) ■ □ 

PROOF of Theorem 3.3. As mentioned in the proof of Theorem 3.1, 
= JB( m) + O p (n~ 1 / 2 ), which leads to V( m) = V (m ) + O p (n~ 1 / 2 ) and ^2^2 
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+ O p {n~ 1 / 2 ). By (3.5) and (A.10), we have % = r h + C^n" 1 / 2 ). 

Therefore, Wc m \ is a root n consistent estimator of W( m y The rest of the 
proof will follow the same general structure as that of (i) of Theorem 3.1. 
Suppose Wt m ) nas the singular-value decomposition 

where M and 5? are, respectively, mx m and H x H orthonormal matrices, 

~- 1/2 1/2 

and D = diag(A 1 (£( m )), . . . , X^ ^ (£( m ))). As before, consider the partition 

M = \£$\\2% 2 \ and y = \Sf\\S?-j\ where 3t\ and S?\ have K {m) columns, and 
£%2 and 5^ 2 have m — Kr m \ and H — K( m ) columns, respectively. By Lemma 3, 
B( m ) and VF( m ) have the same column space, therefore we can take £% 2 = &2 
without loss of generality. Similar to the definition of £?2, we proceed to 
construct 5^ 2 . Again, since span(/xi — fi, . . . , — A*) has dimension less than 
or equal to Kq, there exists a matrix 5? 2a with dimension H x {H — Kq) and 
orthonormal columns such that 

(jii-H,...,HH-n)* (FA(A^ g A)-y 2o ) = 0. 

Observe that (/ii — fi, ... , fin — [i)*FA(A c /'gA)~A~ 1 g = since A _1 g spans 
the null space of A ^f s A. Without loss of generality, let A _1 g be the last 
column of S^o- Let T be as defined in (A. 2). As in (A. 3), we obtain 

W {m) y 2o = MF A{A/ s A)-y 2o 

= T{(jii-fi,...,HH- fi)}FA(A^ s A)-y 2o 
= 0. 

Since Kr m ) = Kq, we can, and will, take y 2 to be S^ 2o . Again, by Theorem 
4.1 in Eaton and Tyler (1994), the smallest m — Kr m \ singular values of 

Wf m ) are asymptotically equivalent to those of U* := & 2 W( m yy 2 , so that 
we have 

^K , (m) =ntv{U*(U:) T } + o p (l). 
Let & = G J % A(A / S A)-, J^ = G figA{A /§A)~ ■ Similar to Lemma 2, 
nVSf/* = n V2^Tjj^y 2 = n y 2 {^M^^ 2 + &>2(M- M)^y 2 } + o p (l). 

By arguments similar to those in the proof of Lemma 2, we haven 1 / 2 ^'J(M- 
M)&y 2 = 0p {l). Let E* = A f s A(A A A )" ^2^2 '(A f^A)~ A J % A. Thus, 

^K ,(m) = ntri^M&y^^M^^) + Op(l) A tv(ZE*Z T ) 
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by Lemma 1, where Z is (to — Kq) x H matrix with independent Normal(0, 1) 
entries. By the fact that A ^ g A(A ^fgA) - is a projection matrix with only 
one null vector A _1 g and the assumption that A~ x g is a column of S2, it 
easily follows that H* is a projection matrix with trace equal to H — 1 — Kq. 
Therefore, tr(ZH*Z T ) is distributed as x\ m - Ko ) x ( H -K -i)- □ 

A. 3. Proof of Theorem 3.4. Some of the variables and matrices intro- 
duced in earlier sections depend on to, and we will add the subscript r m \ 
to those quantities to emphasize this dependence in the proof. Consider the 
singular- value decomposition 



(m) — ^(m) I q q J ' 

where ^(m) an d S( m ) have the same partition as before (see the proof 
of Theorem 3.1): ^ (m) = [^i,( m )l^2,( m )] and £ {m) = [=Si,( m )|^ 2 ,(m)]- The 
nonuniqueness of ^2,(m) an d &i,{m) allows us to construct ^2,(m) anc l 
=S 2 ,(m) i n a particular way, as follows. It will be easier to think about the 
case where if( m ) = Kq for m large enough. We will henceforth make this as- 
sumption even though it is not necessary for the result to hold. Thus, there 
exits an ascending sequence < mi < m 2 < • • • < rriK < oo, such that 

rrij = min{?n, K {m) >j}, j = 1, . . . , K , 

which are the instances where the rank of Br m ) changes. 

We first construct =S 2) ( m ) whose columns span the null row space of -B( m ) . 
Let £?2o be as in the proof of Theorem 3.1. Define a sequence of orthonormal 
vectors qj, j = 1, . . . , Kq by backward induction: 

T 



B (m K() -i)<iKo = and £ 2o qjc = 



B( mj -i)<lj — 



and 



[qj+i,---,q^ ,^2o] T qj =0, j = K - 1,...,2; 
[q 2 ,...,q^ ,^ 2 o] T qi =0. 
Such a sequence of qj's clearly exist. Define 

(A.28) £> 2 (m) = { [J^W +i> ■ ■ ■ , q*o, . ™ < , 

V ' ' ( j \^2o, TO>TO^ . 

Thus, =S 2 ,(m+l) = ^2,(m) if ^(m+i) = ^(m) , otherwise =S 2 ,(m+l) equals =S 2 ,(m) 
minus the first column. 

We next construct ^2,(m)> a matrix of dimension to x (to — Kr m j), whose 
columns generate the null column space of Bf m y To do that, we start with 
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m = Kq + 1 for which we will just make an arbitrary choice of ^2,(m) that 
works. Suppose we have defined <^2,(m) f° r some m. If if( m+1 ) = K^ + 1, 
let 

<^2,(m+l) 



^2,(m) 



if ^( m+ i) = K( m ) , let 

/ <^2(m) \ 

^2,(m+i) = (^2i,(m+i), v m+ i) where ^ 2 l,(m+l) = ( n T ' 



(A.29) 

where v m+ i is a new null singular column vector in R m+1 . Thus, a whole 
sequence of ^2,(m) can be defined recursively in this manner. 

We now briefly summarize some of the key points in the proof of Theorem 
3.1. For each m > Kq + 1, there exists a Gaussian random matrix Zi m \ such 
that 



(A.30) 



j=K -K (m) +l 



if we write Z( m ) = [Z^i m ^\Z r m -A where Z*{ m \ contains the first Kq — if( m ) 
columns of Z( m ), then Z^i m \ is independent of Z j m \, and Z a i m \ contains 
independent Normal(0, 1) random variables except the last column which 
contains zeros. The proof of Proposition 3.2 shows that there exist orthonor- 
mal vectors <j>i^ m ), ■ ■ ■ , <Pm-K ,(m) m M m ~ K( - m '> that are orthogonal to the 
columns of ^*r m ) such that 

m— Kq 

( t > j,{m) Z (m) Z (m) ( t > j,(m) ~ X( m -K )x(H-K -l)- 

i=i 

Note that the </>j- r m \'s are obtained by relabeling the Vj's in that proof. By 
(A.30) and (A.31), using the notion of (A. 25), we conclude that ^K ,(m) 
is asymptotically bounded by 3si m \- Thus, we have the desired stochastic 
bound for the first term, m = Kq + 1, but so far there is nothing new. 
To define ^( m+ i), we proceed in a similar manner by identifying a set of 
orthonormal vectors 4>j,( m +i)->j = 1, - • - , (wi + 1) — Kq. As we will see, the 
specific choice of =^2,(m) an d &2,(m) that was made enables us to directly 
relate Z^ and Z( m+1 ) in a probability space, and, consequently, the two 
bounds as well. 

Consider the two situations K^ m+1 ^ = K^ + 1 and K( m+1 \ = K( m \ sep- 
arately. 
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Case 1: K/ m+ i\ = Kr m -\ + 1. In view of the relationship between i^2,(m)j 

^2,(m)) and (<^2,(m+l)> ^2,(m+l))> it: is eas y to see tliat ^(m+l) is ec l ual to 
Z( m \ less the first column. Below we denote the kih column of Z*{ m \ by 

z fc,(m)- Define 

0j,(m)j 
(J-iI)z 1|(m ) 







j,(m+l) 



j = l,...,m- K , 



j = ( m + 1) - JST 



(i - n)z lj(m ) | 

where II is the projection matrix onto span{z 2j ( m ) , . . . , ^K -K {m) ,(m)]- Ob- 
serve that the vectors 4>ji m -\-i)i3 = 1, . . . , (ttt, + 1) — Kq, are orthonormal. 
Define 

(m+l)-K 

(A.32) iT( m+1 ) = ^2 0£(m+l)^(m41)^m+l)0j,(m+l) = + % > 

where = ^ m+1) _^ Oi(m+1) Z (m+1) Z T n+1) (m+1) _^ Oj(m+1) . Note that 

0(m+l)-Ko,(m+l) G s P an " L { z 2,(m), • • • j Z K -K (m) ,{m)} 

and is independent of Z r m y The conditioning arguments in the proof of 
Proposition 3.2 can be applied to conclude that S£ ~ Xh-k -i ano - * s ^de- 
pendent of 3£i m )- As a result, ^K ,(m) an d ^k ,(m+l) are jointly asymptoti- 
cally bounded by 3£{ m \ and 3£( m \ + JT. 



Case 2: K, 



(m+l) 



K( m ) . In this case, 



Z 



(m+l) 



W 



(m) 
T 



where w T is the limit of v^ +1 B( m+1 )^ 2 ,(m); se e (A. 29). Define 







3, (m+l) 



4>j,(m) 









(/-n; 


e 



l,...,m-K , 



(m + l) 



|(/-n)e||' 

where, in this case, II is the projection matrix onto the column space of 
Z*.(m.+i) and e = (0% l _ K ) A) T - Define 3£(m-\-i) as i n (A.32) and the same 
jointly asymptotic bound can be concluded for ^K ,(m) an d ^K ,(m+i)i as 
in the previous case. 

These construction steps can be implemented recursively, thereby com- 
pleting the proof of Theorem 3.4. 

Acknowledgments. We are grateful to the Associate Editor and two ref- 
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